How often is each gene mutated within the cancer patient population?

ABSTRACT Genome sequencing of samples from cancer patients has helped identify the roles of different mutation types and enabled the development of targeted therapies. However, critical questions such as "What are the gene mutation rates among patients?" and "Which genes are most commonly mutated pan-cancer?" have only recently been answered. Here, we highlight this recent advance.

Technological advances have made genomic data widely accessible, giving rise to a better understanding of the roles that specific gene mutations play in the development and progression of specific types of cancer, as well as of cancer in general. As a consequence, the advent of cancer genomics presented the welcome promise of both an increased understanding of the cellular and molecular mechanisms that underlie the causes of cancer and, importantly, the identification of targets for personalized therapies that could substantially reduce the mortality and morbidity burdens of this devastating disease. However, some very basic questions cannot be addressed by these data alone, such as: Within the cancer patient population, how often is any given gene likely to be mutated? What are the most commonly mutated genes across all cancer patients? Presumably, targets for personalized therapies could be better prioritized with a better understanding of their true incidence in the patient population.
Currently available pooled and pan-cancer resources 1-4 cannot answer the aforementioned questions because the number of samples of each cancer type in those resources is not proportionate to the actual incidence of that cancer (Figure 1). To overcome this problem, we introduced a framework that performs a post-hoc analysis of available cancer genomic data and integrates the sequencing data with cancer epidemiological data to remove some of the sampling bias from the genomic sequencing studies. This allowed us to calculate mutation frequency estimates that better represent the cancer patient population of the United States. 5 Mathematically, we calculated a weighted average: for each gene, the mutation rate within each form of cancer is multiplied by the proportion of all cancers accounted for by that form of cancer, and the products are summed across cancer types. However, this undertaking was complicated by the fact that cancer genomics and cancer epidemiology use distinct naming systems for the same type of cancer. To overcome this, we created a new scheme for the reclassification of sequencing and epidemiological tumor type annotations (ROSETTA). With ROSETTA, 93% of sequenced cancer samples were mapped to an appropriate equivalent cancer type from the epidemiological data obtained from the NCI SEER survey.
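The weighted average described above can be illustrated with a minimal Python sketch. The cancer type labels, counts, and incidence shares below are invented purely for illustration and are not taken from the study; the function names are hypothetical, and rescaling the shares so that they sum to 1 is an assumption of this sketch rather than a documented step of the published pipeline.

```python
def pooled_rate(mutated, sequenced):
    """Naive pooled rate: all mutated samples over all sequenced samples."""
    return sum(mutated.values()) / sum(sequenced.values())


def weighted_rate(mutated, sequenced, incidence_share):
    """Incidence-weighted rate: sum over cancer types of
    (share of all cancers) * (mutation rate within that cancer type).

    Shares are rescaled to sum to 1 (an assumption of this sketch).
    """
    total_share = sum(incidence_share.values())
    return sum(
        (share / total_share) * mutated[cancer] / sequenced[cancer]
        for cancer, share in incidence_share.items()
    )


# Invented numbers for one hypothetical gene across three cancer types.
mutated = {"type_a": 240, "type_b": 2, "type_c": 75}       # samples with the gene mutated
sequenced = {"type_a": 600, "type_b": 100, "type_c": 300}  # samples sequenced
incidence_share = {"type_a": 0.2, "type_b": 0.5, "type_c": 0.3}  # share of all diagnoses

print(f"pooled:   {pooled_rate(mutated, sequenced):.3f}")
print(f"weighted: {weighted_rate(mutated, sequenced, incidence_share):.3f}")
```

In this toy example the pooled rate is 0.317, whereas the incidence-weighted rate is 0.165, because the heavily sequenced type_a accounts for a smaller share of diagnoses than of samples.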
We took exome data from 139 different studies, which together contained 22,730 samples, and processed these data with ROSETTA. Importantly, we found that 3,549 of these samples were redundant (the same sample included in more than one study, or longitudinal samples from the same patient); we excluded these redundant and repeat samples, leaving 19,181 unique samples for our analysis. We also excluded sequenced cell lines and xenografts because specific mutations may influence the ability of a cancer cell to propagate in either condition. We then processed cancer epidemiology data from more than 7 million patients through ROSETTA to estimate the proportion of all cancers due to each category of cancer. Once both types of data were processed, we computationally integrated them. We also calculated 95% confidence intervals by generating Poisson-distributed in-silico genomic studies for each gene and ROSETTA cancer type, which were then processed through the reweighting pipeline. Software for our analysis is publicly available on GitHub (https://github.com/GMendiratta/ROSETTA-for-Cancer-Mutations).
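One way to picture the confidence interval step is the following Python sketch. It is not the published implementation: it assumes that each in-silico study redraws the number of mutated samples per cancer type from a Poisson distribution whose mean is the observed count, and that the 95% interval is read off the percentiles of the reweighted estimates. The function names, the example counts, and the percentile approach are all hypothetical choices made for illustration.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw from Poisson(lam) with Knuth's method, in chunks of mean <= 30.

    Independent Poisson draws add up to a Poisson draw, so chunking keeps
    exp(-chunk) away from floating-point underflow for large means.
    """
    total = 0
    while lam > 0:
        chunk = min(lam, 30.0)
        lam -= chunk
        threshold = math.exp(-chunk)
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        total += k
    return total


def simulated_interval(mutated, sequenced, incidence_share, n_sim=2000, seed=0):
    """Percentile 95% interval for the incidence-weighted mutation rate."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sim):
        estimate = 0.0
        for cancer, share in incidence_share.items():
            n = sequenced[cancer]
            # Assumption: resampled count ~ Poisson(observed count), capped at n.
            resampled = min(poisson_draw(mutated[cancer], rng), n)
            estimate += share * resampled / n
        estimates.append(estimate)
    estimates.sort()
    return estimates[int(0.025 * n_sim)], estimates[int(0.975 * n_sim) - 1]


# Invented counts for one hypothetical gene in two cancer types.
low, high = simulated_interval(
    mutated={"type_a": 40, "type_b": 5},
    sequenced={"type_a": 100, "type_b": 200},
    incidence_share={"type_a": 0.3, "type_b": 0.7},
)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

For these invented counts the point estimate is 0.3 × 0.40 + 0.7 × 0.025 = 0.1375, and the simulated interval should bracket it.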
Our analysis yielded several surprising findings, including that mutations in oncogenic driver genes were far less common than previously thought (Figure 1). PIK3CA was the most commonly mutated proto-oncogene, being mutated in 13% of cancers. This is notable, as KRAS is often stated to be the most commonly mutated proto-oncogene. 6 Our study found KRAS mutated in 11% of all cancers, as opposed to the 30% mutation rate often quoted, including in the recent literature. 7, 8 We also report that BRAF, which encodes a major effector of Ras proteins, has a mutation incidence of 8%, making it almost as commonly mutated pan-cancer as KRAS. One important implication of our study is that cancer researchers have held inaccurate perceptions of how commonly specific genes are mutated in cancer. The belief that KRAS, NRAS, and HRAS mutations are found in nearly one-third of all cancers helped motivate the creation of high-profile major research programs focused on RAS, 9 but our study found that only approximately 15% of all cancer patients have a KRAS, NRAS, and/or HRAS mutation. It may be prudent to evaluate whether resources should be distributed more equitably to other genes with a high mutation prevalence.
The mutant proteins encoded by mutated genes have been important drug targets for cancer. Our study suggests that few genes are mutated in a large fraction of cancer patients. However, more than 1.9 million new cancer diagnoses are projected for 2022, 10 so a gene mutated in 5% (or 1%) of patients would affect roughly 95,000 (or 19,000) newly diagnosed patients every year. Our study found that 122 genes have an incidence above 5.00% and 5,577 genes have an incidence above 1.00%. Thus, although our study suggests that the road forward for personalized cancer medicine may be less direct, it also suggests that there are many possible paths forward.
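The patient counts quoted above follow from simple multiplication, which can be checked with a short Python sketch. The 1.9 million figure is the projection cited in the text; the function name is hypothetical.

```python
def patients_per_year(incidence, new_diagnoses=1_900_000):
    """Expected number of newly diagnosed patients carrying a mutation
    in a gene with the given pan-cancer incidence (a fraction, e.g. 0.05)."""
    return round(new_diagnoses * incidence)


for incidence in (0.05, 0.01):
    print(f"{incidence:.0%} incidence: {patients_per_year(incidence):,} patients")
```

Because the projection is "more than 1.9 million", these values are best read as approximate lower bounds.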

Disclosure statement
No potential conflicts of interest were disclosed.

Funding
This work was supported by the National Institutes of Health DP2AT011327 (E.C.S).